OPEN ACCESS Freely available online



PLOS ONE



Genetic Signature of Histiocytic Sarcoma Revealed by a Sleeping Beauty Transposon Genetic Screen in Mice

Raha A. Been, Michael A. Linden, Courtney J. Hager, Krista J. DeCoursin, Juan E. Abrahante, Sean R. Landman, Michael Steinbach, Aaron L. Sarver, David A. Largaespada, Timothy K. Starr*

1 Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota, United States of America, 2 College of Veterinary Medicine, University of Minnesota, St. Paul, 
Minnesota, United States of America, 3 Department of Comparative and Molecular Biosciences, University of Minnesota, St. Paul, Minnesota, United States of America, 
4 Obstetrics, Gynecology, and Women's Health, University of Minnesota, Minneapolis, Minnesota, United States of America, 5 Department of Laboratory Medicine and 
Pathology, University of Minnesota, Minneapolis, Minnesota, United States of America, 6 Department of Genetics, Cell Biology, and Development, University of Minnesota, 
Minneapolis, Minnesota, United States of America, 7 Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of 
America 



Abstract 

Histiocytic sarcoma is a rare, aggressive neoplasm that responds poorly to therapy. Histiocytic sarcoma is thought to arise from macrophage precursor cells via genetic changes that are largely undefined. To improve our understanding of the etiology of histiocytic sarcoma, we conducted a forward genetic screen in mice using the Sleeping Beauty transposon as a mutagen to identify genetic drivers of histiocytic sarcoma. Sleeping Beauty mutagenesis was targeted to myeloid lineage cells using the Lysozyme2 promoter. Mice with activated Sleeping Beauty mutagenesis had a significantly shortened lifespan, and the majority of these mice developed tumors resembling human histiocytic sarcoma. Analysis of transposon insertions identified 27 common insertion sites containing 28 candidate cancer genes. Several of these genes are known drivers of hematological neoplasms, like Raf1, Fli1, and Mitf, while others are well-known cancer genes, including Nf1, Myc, Jak2, and Pten. Importantly, several new potential drivers of histiocytic sarcoma were identified and could serve as targets for therapy for histiocytic sarcoma patients.
Introduction

Histiocytic sarcoma (HS) is classified as a neoplastic proliferation with features of histiocytes/macrophages [1]. HS has also been called true histiocytic lymphoma or malignant histiocytosis, but these terms have been discontinued. Before 1990, the majority of patients diagnosed with HS were misdiagnosed due to a lack of antibodies specific for the histiocytic lineage. Retrospective analysis indicated that the majority of these patients actually had B- or T-cell lymphomas [2-5]. Case studies have demonstrated that HS can occur in isolation or in the context of other hematological malignancies, such as B-cell lymphoma, to which the HS is sometimes clonally related [4]. In some cases, HS may thus develop via trans-differentiation from a malignant, or premalignant, lymphoid neoplasm. HS is rare, with an incidence far lower than that of the non-Hodgkin lymphomas [1,6]. Typically, patients present with advanced clinical disease and have a poor prognosis [1,4,5]. Because the genetic etiology of HS is largely unknown, HS is difficult to manage clinically and there is no standard therapy for patients with HS.

Currently, no precursor lesions or etiologic agents have been described for human HS [7]. Two cytogenetic case studies identified gains in chromosome 8 in human HS [8,9], implicating MYC as an HS oncogene. Animal models have identified possible driver genetic lesions. Array CGH performed on over 100 canine HS samples revealed an average of 30 copy number alterations per tumor [10], while a genome-wide association study in Bernese Mountain Dogs identified a strong association between HS and the MTAP-CDKN2A locus [11]. Pten and Ink4a/Arf are also implicated, as compound heterozygous mice develop HS, and 60% of human HS examined for protein expression show a loss of PTEN, p16INK4A or p14ARF [12]. Several other genetic mouse models have produced HS, including Dok1/Dok2/Dok3 triple knockout animals [13], Cjplbl knockout mice [14], p21 knockout mice [15], and pi /Box mutant mice [16]. In addition, 50% of Cdkn2a-deficient mice infected with Moloney murine leukemia virus developed HS, which was frequently accompanied by lymphoma [17].

To identify genetic drivers of HS, we performed an unbiased forward genetic screen in mice using the Sleeping Beauty (SB) transposon as an insertional mutagen [18-20]. SB is capable of both activating proto-oncogenes and inactivating tumor suppressor genes and has been used to identify genetic drivers in a variety of cancers [21-32]. In this study, we activated SB mutagenesis using the Lysozyme2 (Lyz2) promoter in a cohort of mice, resulting in early mortality and a large percentage of mice developing HS. Analysis of transposon common insertion sites (CISs) identified 28 genes associated with HS, including 2 miRNAs. Several of these genes are known oncogenes and tumor suppressors, including Nf1, Pten, Myc and Fli1, while many others have not been directly associated with cancer and could be potential targets for therapy.

Methods and Materials 

Ethics Statement 

All mice were bred, cared for and euthanized in accordance 
with the National Institutes of Health Guidelines for the Care and 
Use of Laboratory Animals. All experiments were approved by the 
University of Minnesota Institutional Animal Care and Use 
Committee (Protocol # 0901A56501). 

Transgenic Mice 

Lyz-Cre mice were obtained from Jackson Laboratories (Strain name: B6.129P2-Lyz2tm1(cre)Ifo/J, Cat # 004781) [33]. These mice were created using a knock-in allele that has a nuclear localized Cre recombinase cDNA inserted into the first coding ATG of the Lyz2 gene. This allele abolishes endogenous Lyz2 gene function and places NLS-Cre expression under the control of the endogenous Lyz2 promoter/enhancer elements.

Rosa26-LsL-SB11 mice backcrossed to C57BL/6J were a generous gift from Adam Dupuy (University of Iowa). These mice were described previously [22].

Three strains of T2/Onc transgenic mice were used. The first two strains, T2/Onc(chr1) and T2/Onc(chr15), contained roughly 25 transposons resident as a concatamer on mouse chromosomes (MMU) 1 and 15, respectively [19]. The third strain, T2/Onc2(chr4), contained roughly 214 transposons resident as a concatamer on MMU 4 [20].

Genotyping and Excision PCR

We isolated tail biopsy DNA using a standard phenol-chloroform extraction method. PCR was performed using primer sequences specific for each transgene. Primer sequences are as follows: LyzM-Cre: primer oIMR3066 5'-CCCAGAAATGCCAGATTACG-3', primer oIMR3067 5'-CTTGGGCTGCCAGAATTTCTC-3', primer oIMR3068 5'-TTACAGTCGGCCAGGCTGAC-3'; T2/Onc or T2/Onc2: Forward 5'-CGCTTCTCGCTTCTGTTCGC-3', Reverse 5'-CCACCCCCAGCATTCTAGTT-3'; LsL-SB11: Wild-type Forward 5'-GGAGGGGAGTGTTGCAATACCTTT-3', Wild-type Reverse 5'-AACTCGGGTGAGCATGTCTTTAATCTAC-3', Transgenic Forward 5'-GGCATTGGGGGTGGTGATATAAACT-3'. T2/Onc excision PCR was performed as previously described [19] using primer sequences: Forward 5'-GGGATGTGCTGCAAGGCGAT-3'; Reverse 5'-CAAGCTATGCATCCAACGCGTT-3'.

PCR analysis of VDJ rearrangement at Tcrb and Igh loci

DNA was isolated from eight representative tumors and from control tissues of wild-type animals. For Tcrb analysis, two forward primers in the V locus and one forward primer in the D locus were used in conjunction with a reverse primer in the J locus. For IgH analysis, two forward primers in the D locus were used with a reverse primer in the J locus. Primer sequences are as follows: Vb8.2 5'-CTACCCCCTCTCAGACATCA-3', Jb2 5'-TGAGAGCTGTCTCCTACTATCGATT-3', Vb11.5 5'-TGCTGGTGTCATCCAAACACCTAG-3', Db2 5'-GTAGGCACCTGTGGGGAAGAAACT-3', VHQ52 5'-CGGTACCAGACTGARCATCASCAAGGAC-3', VH7183 5'-CGGTACCAAGAASAMCCTGTWCCTGCAAATGASC-3', JH3 5'-GTCTAGATTCTCACAAGAGTCCGATAGACCCTGG-3'.

Kaplan-Meier Analysis

Survival was examined using Kaplan-Meier curves (Prism software, GraphPad) and statistically analyzed using the log-rank test, controlling for multiple comparisons with the Sidak method [34].
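As an illustration of the statistics named above, the following minimal Python sketch computes the standard two-group log-rank statistic and a Sidak adjustment. It is not the GraphPad Prism implementation used in the study; the function names (`logrank_test`, `sidak_adjust`) and the toy survival data are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up time for each animal (e.g. days)
    events_* : 1 if the animal reached the endpoint (moribund), 0 if censored
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0        # hypergeometric variance of that difference
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)             # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)         # events at t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    # upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def sidak_adjust(p, m):
    """Sidak-adjusted p-value for m comparisons: 1 - (1 - p)^m."""
    return 1.0 - (1.0 - p) ** m

# hypothetical data: group A reaches the endpoint early, group B late
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
print(round(chi2, 2), p, sidak_adjust(p, 3))
```

With several groups, each pairwise p-value would be passed through `sidak_adjust` with the number of pairwise comparisons before being compared with alpha; how many comparisons were made is not stated in this section.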

Histopathology and Immunohistochemistry 

Mice were necropsied when moribund or at 1.5 years of age, whichever came first. Lungs, heart, lymph nodes, spleen, pancreas, sternum and all abnormal tissues were removed and visually inspected for macroscopic tumors. Tissues were either fixed in 10% formalin or snap frozen in liquid nitrogen. Formalin-fixed samples underwent standard tissue processing, were paraffin-embedded, mounted and sectioned at 5 µm. Sections were adhered to glass slides by heat fixation. Slides were processed and stained with hematoxylin-eosin (HE). Immunohistochemistry was conducted with citrate-based antigen retrieval. Tissues were stained with antibodies for Mac2 and F4/80 (Cedarlane, Burlington, NC; clones M3/38 and CI:A3-1, respectively), Lyz and CD3e (Dako, Carpinteria, CA; polyclonal), and Pax5 (Santa Cruz, Santa Cruz, CA). Tissues were analyzed by a board-certified pathologist (ML, American Board of Pathology).

Transposon insertion analysis 

Genomic DNA was isolated using standard phenol-chloroform extraction and ethanol precipitation. DNA was subjected to linker-mediated PCR as previously described [23], except that primer sequences were changed to include 12 bp barcodes and Illumina HiSeq 2000 platform-specific sequences (sequences available upon request). PCR amplicons were sequenced on the Illumina HiSeq 2000 platform following the manufacturer's protocol.

Sequences were mapped to the mouse genome with Bowtie [35] as part of the TAPDANCE [36] bioinformatics pipeline. TAPDANCE identifies CISs by analyzing genomic windows of varying size, testing each window for significance using the Poisson distribution (p<0.05) with a Bonferroni correction based on the number of windows examined. Based on the 1,575 unique regions, 3 or more insertions within an 8.9 kb window or 4 or more insertions within a 263 kb window were considered a CIS.
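The Poisson window test can be sketched in Python as follows, assuming insertions are scattered uniformly at random over a genome tiled into non-overlapping windows. The genome size (`GENOME_BP`), the tiling scheme and the function names are illustrative assumptions, not TAPDANCE internals.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term (no 1 - CDF cancellation)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # first term e^-lam * lam^k / k!, computed in log space to avoid overflow
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i
        # once i exceeds lam the terms only shrink; stop when negligible
        if i > lam and term <= total * 1e-16:
            return total

def min_insertions_for_cis(n_insertions, genome_bp, window_bp, alpha=0.05):
    """Smallest insertion count k for which a window is significant after a
    Bonferroni correction over all (assumed non-overlapping) windows."""
    lam = n_insertions * window_bp / genome_bp   # expected insertions per window
    n_windows = genome_bp / window_bp            # number of tests
    threshold = alpha / n_windows                # Bonferroni-corrected alpha
    k = 1
    while poisson_tail(k, lam) >= threshold:
        k += 1
    return k

GENOME_BP = 2.7e9  # hypothetical mappable mouse genome size
print(min_insertions_for_cis(1575, GENOME_BP, 8_900))  # prints 3 with these assumed inputs
```

With these assumed inputs the 8.9 kb case comes out at 3 insertions, but a simplified tiling like this is not expected to reproduce every TAPDANCE threshold exactly.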

qRT-PCR 

RNA was extracted from 5 mg of tissue with the RNeasy Mini Kit (Qiagen, Valencia, CA, USA). Tissues were matched normal/tumor samples from liver and spleen. RNA concentrations were determined with an Epoch spectrophotometer system (BioTek, Winooski, VT, USA). One µg of RNA was converted to cDNA with the ABI High-Capacity cDNA Reverse Transcription Kit (#4368814) according to the manufacturer's instructions. Gene-specific primers were designed from sequences retrieved from GenBank using Primer3 v4.0 (http://frodo.wi.mit.edu/primer3/). All primer sequences are available upon request. Quantitative PCR (qPCR) was carried out on an ABI 7500 system in triplicate using FastStart Universal SYBR Green Master (Roche, Indianapolis, IN, USA) in 20 µl reactions containing 250 nM (final concentration) of each forward and reverse primer, 5 µl of diluted cDNA (~25 ng) and 10 µl of 2X SYBR Master Mix. Cycling parameters consisted of an initial denaturation step at 95°C for 10 min, followed by 40 cycles of amplification at 95°C for 15 s and 60°C for 1 min, and a dissociation step.
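The section does not state how Ct values were converted into the tumor-versus-normal expression changes reported in the Results. One widely used option is the comparative Ct (2^-ΔΔCt) method, sketched below in Python under that assumption; the reference gene, the function name `fold_change_ddct` and the Ct values are hypothetical.

```python
from statistics import mean

def fold_change_ddct(target_tumor, ref_tumor, target_normal, ref_normal):
    """Tumor-vs-normal relative expression by the comparative Ct method.

    Each argument is a list of replicate Ct values (e.g. triplicate wells).
    Assumes ~100% amplification efficiency for target and reference genes.
    """
    d_ct_tumor = mean(target_tumor) - mean(ref_tumor)     # normalize to reference gene
    d_ct_normal = mean(target_normal) - mean(ref_normal)
    dd_ct = d_ct_tumor - d_ct_normal                       # tumor relative to matched normal
    return 2.0 ** (-dd_ct)

# hypothetical triplicate Ct values for one gene in one tumor/normal pair
fc = fold_change_ddct([22.1, 22.0, 21.9], [18.0, 18.1, 17.9],
                      [24.0, 24.1, 23.9], [18.0, 17.9, 18.1])
print(f"fold change: {fc:.2f}")
```

A fold change above 1 indicates higher expression in the tumor than in its matched normal tissue; the method assumes similar amplification efficiencies for target and reference genes.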
Association Analysis

Frequent itemset mining was performed to find groups of insertion regions, including larger sets of three or more regions, that frequently co-occur in the tumors. This branch of data mining originated in the analysis of market basket transaction data. More specifically, frequent itemset mining efficiently determines items that are frequently purchased together from a binary transaction matrix in which rows represent different transactions by customers at a store, columns represent the different items available, and entries indicate whether or not each item was purchased in that transaction [37,38]. For our purposes, tumors act as transactions (rows), genes are the items (columns), and frequent itemset mining is used to determine which sets of genes co-occur across multiple tumors.
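The transaction-matrix idea can be made concrete with a small level-wise (Apriori-style) search, sketched below in Python. This is a generic textbook version, not the specific implementation cited [39-41]; the toy tumors and gene sets are hypothetical, and `min_support=3` mirrors the support threshold of three used in this study.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset contained in at least
    min_support transactions, found level by level."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # level 1: frequent single items
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = set()
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current.add(c)
                frequent[c] = s
        k += 1
    return frequent

# hypothetical tumors (rows/transactions) and genes with insertions (items)
tumors = [
    {"Raf1", "Mitf", "Nf1"},
    {"Raf1", "Mitf"},
    {"Raf1", "Mitf", "Fli1"},
    {"Fli1", "Bach2"},
]
for itemset, sup in sorted(apriori(tumors, 3).items(), key=lambda x: -x[1]):
    print(sorted(itemset), sup)
```

Because any superset of an infrequent itemset is itself infrequent, the prune step discards candidates early, which is what keeps the search tractable.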

The set of unique insertion regions produced by TAPDANCE [36] was transformed into a set of genes by mapping each insertion region to its nearest gene. Each tumor is then represented by the set of genes containing at least one insertion in that tumor, which forms a binary transaction matrix in which rows are tumors and columns are affected genes. Closed frequent itemsets (a condensed form of frequent itemset results) were then extracted from the transaction matrix using an Apriori-based algorithm to produce a list of candidate gene patterns [39-41]. This algorithm was run with a support threshold of three, meaning that only gene patterns that co-occur in three or more tumors are considered. Some support counts were then modified to reflect the number of unique mice that had the gene pattern, rather than the number of tumors. This corrects for similar insertion sets in tumors originating from the same mouse.
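Two post-processing steps described above, keeping only closed itemsets and counting support by mouse rather than by tumor, can be sketched as follows. The data and the helper names `closed_itemsets` and `mouse_support` are hypothetical; in practice the supports would come from the frequent itemset search.

```python
def closed_itemsets(frequent):
    """Keep only closed itemsets: those with no proper superset of equal support."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < t and frequent[t] == sup for t in frequent)}

def mouse_support(itemset, tumors, mouse_of):
    """Number of distinct mice (not tumors) carrying the whole itemset."""
    return len({mouse_of[i] for i, genes in enumerate(tumors) if itemset <= genes})

# hypothetical toy data: tumors 0 and 1 come from the same mouse
tumors = [{"Raf1", "Mitf", "Nf1"}, {"Raf1", "Mitf"}, {"Raf1", "Mitf", "Fli1"}, {"Raf1"}]
mouse_of = ["m1", "m1", "m2", "m3"]
frequent = {frozenset({"Raf1"}): 4, frozenset({"Mitf"}): 3,
            frozenset({"Raf1", "Mitf"}): 3}   # tumor-level supports

for itemset in closed_itemsets(frequent):
    print(sorted(itemset), "tumors:", frequent[itemset],
          "mice:", mouse_support(itemset, tumors, mouse_of))
```

In this toy example the pair {Raf1, Mitf} occurs in three tumors but only two mice, so with a mouse-level threshold of three it would no longer qualify.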

A p-value is calculated for each candidate gene pattern by using the support of the pattern as the test statistic. The null distribution is modeled as a binomial with the number of trials equal to the number of tumors and the probability of success equal to the joint probability of the individual genes in the pattern occurring together (based on their individual frequencies in the dataset). To account for multiple hypothesis testing, the significance of each candidate pattern was determined by empirically estimating its q-value [42], which is the minimum false discovery rate (FDR) at which the test may be called significant [43]. Specifically, a set of 10,000 simulated results was generated by randomizing the tumor in which each insertion appears while preserving the overall set of insertion locations and the number of insertions in each tumor. The q-value for each candidate pattern was calculated as the percentage of simulated results with a p-value equal to or better than that of the candidate pattern, divided by the percentage of real patterns with a p-value equal to or better than that of the candidate pattern. Gene patterns with a q-value ≤25% were deemed significant.
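The p-value and q-value calculations can be sketched in Python as follows. The binomial tail follows the null model stated above; `empirical_qvalues` implements one literal reading of the q-value description (a ratio of fractions, with simulated p-values pooled across all randomized datasets, then made monotone), and `shuffle_insertions` produces one randomized dataset that keeps every insertion and each tumor's insertion count. All function names and data are hypothetical.

```python
import math
import random
from bisect import bisect_right

def pattern_pvalue(support, n_tumors, gene_freqs):
    """Upper-tail binomial p-value P(X >= support) for one gene pattern.

    Null: each tumor carries the whole pattern independently with probability
    equal to the product of the genes' individual frequencies.
    """
    p = math.prod(gene_freqs)
    return sum(math.comb(n_tumors, i) * p ** i * (1 - p) ** (n_tumors - i)
               for i in range(support, n_tumors + 1))

def shuffle_insertions(tumors, rng):
    """One simulated dataset: re-deal the pooled insertions to tumors at random,
    keeping every insertion and each tumor's insertion count. (A tumor may
    receive the same gene twice; converting to sets afterwards absorbs this.)"""
    pool = [ins for t in tumors for ins in t]
    rng.shuffle(pool)
    out, start = [], 0
    for t in tumors:
        out.append(pool[start:start + len(t)])
        start += len(t)
    return out

def empirical_qvalues(real_p, simulated_p):
    """q-value per real pattern: (fraction of simulated patterns with p <= t) /
    (fraction of real patterns with p <= t), then the minimum over all larger
    thresholds so that q never decreases as p increases."""
    sim_sorted, real_sorted = sorted(simulated_p), sorted(real_p)
    fdr = {}
    for t in set(real_p):
        frac_sim = bisect_right(sim_sorted, t) / len(sim_sorted)
        frac_real = bisect_right(real_sorted, t) / len(real_sorted)
        fdr[t] = min(1.0, frac_sim / frac_real)
    q, running = {}, 1.0
    for t in sorted(fdr, reverse=True):
        running = min(running, fdr[t])
        q[t] = running
    return [q[t] for t in real_p]

# hypothetical usage
print(shuffle_insertions([["Raf1", "Mitf"], ["Fli1"], ["Nf1", "Bach2", "Myc"]],
                         random.Random(0)))
print(empirical_qvalues([0.01, 0.5], [0.02, 0.6, 0.7, 0.9]))
```

Patterns whose q-value is at most 0.25 would then be reported, matching the 25% cutoff above.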

Statistics 

The Cancer Gene Census and the COSMIC database were downloaded on 4/20/2013 from the Sanger Institute website (ftp://ftp.sanger.ac.uk/pub/CGP/cosmic/data_export). A custom Perl script was used to extract haematopoietic_and_lymphoid_tissue mutations from the CosmicMutantExportIncFus_v64_270313.tsv file. The list of mutations in AML was derived from supplemental tables in the TCGA report on AML [44] by combining the list of Tier 1 mutated genes with the list of fusion genes. Significance of association was computed using a two-tailed Fisher's exact test [45]. MAPK pathway significance was determined by performing 10,000 iterations of randomly assigning mouse genes to libraries and calculating the number of libraries with insertions in MAPK pathway genes using a custom Perl script.
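A two-tailed Fisher's exact test on a 2x2 overlap table can be written with the Python standard library alone, as in the sketch below. The two-sided p-value sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table. The function name is hypothetical, and the usage line's universe of 20,000 genes is an assumption for illustration only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-7)))

# hypothetical overlap table: 10 of 25 CIS orthologs fall in a 487-gene census,
# assuming (for illustration only) a universe of 20,000 genes
universe = 20_000
print(fisher_exact_two_sided(10, 25 - 10, 487 - 10, universe - 25 - 487 + 10))
```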



Results 

SB Insertional Mutagenesis Promotes Histiocytic Sarcoma Formation

To perform a forward genetic screen for HS, we generated mice harboring three elements required for activating SB mutagenesis in myeloid lineage cells. The first element was a nuclear localized Cre recombinase gene knocked into the myeloid-specific Lyz2 locus [46] (Fig S1-A). The Lyz2 promoter is expressed in granulocytes, macrophages, and splenic dendritic cells [33,47]. The second element was a conditional SB11 transposase allele created by inserting a Lox-STOP-Lox-SB11-cDNA construct downstream of the ubiquitous Rosa26 promoter (Fig S1-B) [22,23]. The third element was a concatamer of oncogenic SB transposons (T2/Onc). The SB transposon consists of the terminal inverted and direct repeats required for SB transposition, together with an internal promoter, a splice donor, splice acceptors and a bidirectional polyA signal. The transposon was designed to be capable of overexpressing or disrupting genes, and these transposon-induced mutations provide cells with a selective advantage when they cause overexpression of oncogenes or disruption of tumor suppressors, respectively (Fig S1-C). The internal promoter within the transposon is highly active in hematopoietic stem cells [48]. We have shown that the SB transposon system is capable of generating insertional mutations leading to overexpression of oncogenes, overexpression of truncated genes, and disruption of genes [19,20,23,49]. We designed a breeding scheme (Fig S2) and generated 73 experimental mice carrying all three elements and 117 littermate controls carrying only two of the three elements (Table S1).

Mice were sacrificed and necropsied when they became moribund or at 18 months, whichever came first. Triple transgenic mice became moribund at a faster rate than controls, beginning around one year of age (Fig 1). The majority of mice had malignancies occurring in multiple tissues throughout the body (Fig 2). Over 75% of mice examined had signs of disease, most often localized to the spleen, pancreas, liver, thoracic cavity and peritoneum.

To classify the disease, we prepared hematoxylin and eosin stained slides from multiple tissues from 51 animals (Fig 3A and Fig S3). Evidence of histiocytic neoplasm was visible in 33 of 51 mice (65%). On examination by light microscopy, the tumors comprised a diffuse, relatively non-cohesive proliferation of large cells. The neoplastic cells were large and round to oval in shape, with focal spindling, large nuclei and abundant cytoplasm. The cytoplasm was eosinophilic, with fine granularity. The nuclei had vesicular chromatin, and many had prominent nucleoli. Notably, some of the neoplasms had rather bland morphology, while others had marked pleomorphism and increased mitotic activity (Fig S3, panel G). The neoplasms invaded adjacent tissues, including muscle, spleen, liver, pancreas, lung, and bowel (Fig S3). Eight of these tissues were further analyzed by immunohistochemistry using a panel of antibodies to confirm histiocytic differentiation (Mac2, F4/80 and Lyz) and to exclude B-lineage (Pax5) or T-lineage (CD3) cells (Table S2 and Fig 3B-F). All eight tissues were strongly positive for Mac2, positive for Lyz, and negative for Pax5. Seven of the eight stained positive for F4/80, while three of eight were weakly positive for CD3; the level of CD3 staining was negligible in two of these and not diagnostic of T cell lineage. The immunophenotypic characteristics of these neoplasms, in conjunction with the morphologic features, are most consistent with the characteristic HS that occurs in mice [7]. We also performed PCR on DNA from these same eight tumors using primers crossing VDJ boundaries in both the TCRb locus and the IgH locus. Multiple bands were amplified in control tissues (thymus for TCRb and spleen for IgH), while no bands, or only germline bands, were amplified in seven of eight HS tumors (representative images in Fig 4). The morphologic, immunophenotypic and molecular data support that the neoplasms are histiocytic in origin and do not have associated B- or T-lymphoid differentiation. Thus, they are best characterized as HS.

Figure 1. Kaplan-Meier survival curve showing decreased survival in triple transgenic experimental animals compared to double transgenic controls. Significance determined using the log-rank test.
doi:10.1371/journal.pone.0097280.g001

Identification of candidate driver genes and pathways in HS

To find genetic drivers of HS, we analyzed transposon insertions in 92 tumors from 36 different mice. The tumors were distributed among eight different anatomical locations (Table S3). We were able to confirm that 35 of the 92 tumors were HS based on histology. The remaining tumors are assumed to be HS based on gross pathology, but we did not have enough tissue to confirm this by histological examination. We performed linker-mediated PCR (LM-PCR) on purified DNA from these tumors to amplify transposon-genomic fragments and then sequenced the amplicons using the Illumina HiSeq 2000 platform. Sequences were analyzed using a bioinformatics pipeline we developed called TAPDANCE [36]. Approximately 13.8 million sequences were mapped to the genome. Redundant sequences and sequences mapping within 100 bases of each other were combined, resulting in 11,885 non-redundant mapped regions. The depth of sequence reads obtained on the Illumina platform allowed us to filter regions based on the number of sequence reads that mapped to each region. We reasoned that regions with only one or a few reads could either be artifacts or be present in only a minority of cells, while regions with a larger number of reads were more likely to be present in a majority of tumor cells. We therefore required each region to account for at least 0.01% of the total reads mapped in a single tumor. For example, one of our tumors had 227,882 reads in 365 regions; using our threshold, a single region would require at least 23 mapped reads (0.01% of 227,882 is 22.8, rounded up) to be included in our analysis. Of the 365 regions mapping in this tumor, only 90 met the threshold. Out of the 11,885 non-redundant regions, 1,575 unique regions met the threshold (Table S4), which works out to approximately 17 insertions per tumor, with a range of 1 to 90. A BED-formatted version of the unique regions (Table S5) is also provided for use with the Integrative Genomics Viewer (IGV) or for uploading to a genome browser to analyze insertion positions relative to exons.
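The merging and read-count filter described above can be sketched in Python as follows. The single-linkage merge (each site joins the current region if it lies within 100 bases of that region's last site), the per-chromosome input format and the function names are assumptions; TAPDANCE's own merging rules may differ. The threshold uses integer ceiling division so that 0.01% of 227,882 reads rounds up to the 23 reads quoted above.

```python
def merge_nearby(sites, max_gap=100):
    """Merge insertion sites on one chromosome lying within max_gap bases of
    the previously merged site, summing their read counts.

    sites: list of (position, reads). Returns (start, end, total_reads) tuples.
    """
    regions = []
    for pos, reads in sorted(sites):
        if regions and pos - regions[-1][1] <= max_gap:
            start, _, total = regions[-1]
            regions[-1] = (start, pos, total + reads)
        else:
            regions.append((pos, pos, reads))
    return regions

def min_reads_required(total_tumor_reads):
    """Reads needed to reach 0.01% (1/10,000) of a tumor's mapped reads,
    rounded up with integer arithmetic to avoid floating-point error."""
    return -(-total_tumor_reads // 10_000)

def filter_regions(regions, total_tumor_reads):
    """Keep regions whose read count meets the per-tumor threshold."""
    threshold = min_reads_required(total_tumor_reads)
    return [r for r in regions if r[2] >= threshold]

# hypothetical sites on one chromosome of one tumor
regions = merge_nearby([(1_000, 5), (1_050, 3), (1_300, 40)])
print(regions)                          # [(1000, 1050, 8), (1300, 1300, 40)]
print(min_reads_required(227_882))      # 23, matching the worked example
print(filter_regions(regions, 227_882)) # [(1300, 1300, 40)]
```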

In previous screens we noted that transposon insertions mapping to the donor chromosome, where the original transposon transgene was located, constituted up to half of all the mapped transposon insertions [19,23,26], a phenomenon referred to as "local hopping". In this experiment we generated experimental mice using three different founder strains with the donor concatamer on a different chromosome in each strain (chr1, chr4 and chr15). Surprisingly, in these tumors we did not see a large bias of insertions toward the donor chromosome. In general, the percentage of insertions on the donor chromosome for each of the three respective T2/Onc strains was 2 to 3 times higher than expected (Table S6). Because the insertion distribution was not heavily skewed towards the donor chromosome, we performed four separate CIS analyses. The first three analyses eliminated the donor chromosomes (1, 4 and 15) in the respective tumors, while the fourth analysis included all chromosomes. Of the seven CISs identified on the donor chromosomes, five were still identified when the insertions on those chromosomes were excluded from the subset of tumor libraries carrying the corresponding donor concatamer. The other two CISs (Bach2 and Atp6v1c1) were not identified when the donor chromosome was excluded, indicating they may be biased by the donor concatamer. All of the CISs identified in the four analyses were merged into a single list, resulting in a final list of 27 CISs (Table 1).

Figure 3. Typical morphologic and immunophenotypic characteristics of the murine histiocytic neoplasms generated by a forward genetic screen. All images were captured using a 50X oil objective. The depicted neoplasm was present near the pancreas in one mouse (see supplementary figures for additional morphologic characterization). A) H&E (note abundant granular cytoplasm and large nuclei); B) MAC2 immunostain; C) F4/80 immunostain; D) Lysozyme immunostain; E) CD3 immunostain (a single lymphocyte in the lower left quadrant stains positively); F) PAX-5 (inset denotes on-slide positive control).
doi:10.1371/journal.pone.0097280.g003

Because we could not positively diagnose all the sequenced tumors via histology, we performed a second CIS analysis using only those tumors with corresponding histological analysis confirming HS. Because there were fewer tumor libraries, only six CISs were identified in this analysis based on the criteria described above. All six of these CISs (Raf1, Mitf, Nf1, Fli1, Bach2, and Rreb1) were also present in the list of 27 CISs identified in the original analysis (Table 1).

Figure 4. TCR and Ig genes are not rearranged in tumors. A) PCR amplification of the TCR locus using genomic DNA from an HS tumor (Lyz-728) indicates no rearrangement of the TCR VDJ locus. B) PCR amplification of the IgH locus indicates no rearrangement of the IgH DJ locus. Thymus, spleen, and tail DNA were from a wild-type control animal.
doi:10.1371/journal.pone.0097280.g004

To determine the clonality of tumors arising in a single animal, we measured the overlap in insertions between tumors from the same animal. Several tumors were apparently clonal, based on the large percentage of shared insertions, although the majority of tumors did not share transposon insertions with other tumors from the same animal (Table S7). To eliminate the bias these clonal tumors may have introduced into the CIS calculation, we required that all CISs consist of tumors from at least three separate mice. As a conservative test, we re-calculated CISs, this time treating all the tumors from each animal as a single tumor. This re-calculation still identified 24 of the 27 loci, indicating that tumor clonality did not significantly affect CIS detection.
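The clonality check and the two mouse-level safeguards can be illustrated with a few set operations. In this sketch, overlap between two tumors is scored with the Jaccard index of their insertion sets (an assumed measure; the text only says that overlap was measured), and the tumor and mouse identifiers are hypothetical.

```python
from collections import defaultdict

def jaccard(a, b):
    """Fraction of insertions shared by two tumors (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def mice_supporting(gene, tumor_genes, mouse_of):
    """Distinct mice with at least one tumor carrying an insertion in gene."""
    return {mouse_of[t] for t, genes in tumor_genes.items() if gene in genes}

def collapse_by_mouse(tumor_genes, mouse_of):
    """Treat all tumors from one mouse as a single tumor (union of their genes)."""
    merged = defaultdict(set)
    for t, genes in tumor_genes.items():
        merged[mouse_of[t]] |= set(genes)
    return dict(merged)

# hypothetical data: T1 and T2 come from the same mouse and look clonal
tumor_genes = {"T1": {"Raf1", "Mitf"}, "T2": {"Raf1", "Mitf"},
               "T3": {"Raf1"}, "T4": {"Fli1"}}
mouse_of = {"T1": "M1", "T2": "M1", "T3": "M2", "T4": "M3"}

print(jaccard(tumor_genes["T1"], tumor_genes["T2"]))
print(len(mice_supporting("Raf1", tumor_genes, mouse_of)) >= 3)  # three-mouse rule
print(collapse_by_mouse(tumor_genes, mouse_of))
```

Here Raf1 is hit in three tumors but only two mice, so it would fail a three-mouse requirement, and collapsing by mouse reduces the four tumors to three pooled "tumors".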

Manual analysis of the transposon insertion patterns in the 27 genomic loci allowed us to identify 28 candidate genes, including two microRNAs, and we could predict the effect (gain- or loss-of-function) for 21 of these genes based on the location and direction of the transposon insertions in the gene locus (Table 1). The three top hits, ranked by percentage of tumors contributing to the CIS, were Raf1 (alias c-Raf), Bach2 and Fli1. Over 25% of all tumors had a mutation in one of these three genes, and half of these tumors had mutations in at least two of the genes.
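The rules used to call gain or loss of function are not spelled out here. A common reading of SB insertion patterns is that insertions clustered near the 5' end in the gene's own orientation suggest activation by the transposon promoter, whereas insertions in both orientations throughout the gene suggest disruption. The Python sketch below encodes that heuristic; the 80% cutoff, the position encoding and the function name `predict_effect` are hypothetical.

```python
def predict_effect(insertions, min_bias=0.8):
    """Hypothetical gain/loss call for one CIS gene.

    insertions: list of (rel_pos, same_strand) pairs, where rel_pos is 0.0 at
    the gene's 5' end and 1.0 at its 3' end (negative = upstream), and
    same_strand is True when the transposon promoter points the same way the
    gene is transcribed.
    """
    n = len(insertions)
    if n == 0:
        return "undetermined"
    frac_same = sum(1 for _, same in insertions if same) / n
    frac_5prime = sum(1 for pos, _ in insertions if pos <= 0.5) / n
    if frac_same >= min_bias and frac_5prime >= min_bias:
        return "gain"   # co-oriented insertions clustered toward the 5' end
    if max(frac_same, 1 - frac_same) < min_bias:
        return "loss"   # both orientations, consistent with gene disruption
    return "undetermined"

print(predict_effect([(-0.02, True), (0.05, True), (0.10, True)]))           # gain
print(predict_effect([(0.2, True), (0.5, False), (0.8, True), (0.9, False)]))  # loss
```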

To measure the effect of the SB transposon insertions, we selected a small subset of tumors for which we had sufficient frozen tumor tissue, along with matched normal tissue, to extract RNA and perform qRT-PCR. We selected four tumors from three mice and measured the expression levels of four genes (Fli1, Nf1, Mitf, and Raf1) in the tumors that had insertions in these genes. Based on the transposon insertion pattern, we predicted that Fli1, Mitf, and Raf1 would have gain-of-function mutations, while Nf1 would have a loss-of-function mutation. In ten of the eleven comparisons possible in this set of tumor/normal tissue pairs, the mRNA level changed in the predicted manner (Fig S4).

Network analysis and relevance to human HS 

Because Raf1 was the site of transposon mutagenesis in over 20% of tumors analyzed, we checked for transposons inserted near other MAPK pathway genes in tumors without a Raf1 insertion. We found that 44 of the 92 tumors (48%) generated in our screen had a transposon insertion within 10 kb of an annotated MAPK pathway gene, based on the KEGG [50] MAPK pathway gene list (Table S8). To measure the significance of this finding, we analyzed randomly generated datasets. The average number of libraries with a MAPK insertion, across 10,000 randomly generated datasets, was 13 (SD 3.0), which is significantly lower than the 44 libraries found in our set of tumors.
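The randomization can be sketched in Python as below: in each iteration every library receives as many randomly drawn genes as it actually has, and the number of libraries hitting at least one MAPK pathway gene is recorded. Uniform sampling without replacement, the gene universe, the MAPK gene set and the per-library gene counts are all assumptions, as are the function names; the original analysis used a custom Perl script.

```python
import random

def permute_mapk_counts(genes_per_library, gene_universe, mapk_genes,
                        n_iter=10_000, seed=0):
    """Null distribution of 'libraries with >= 1 MAPK-gene insertion'.

    Each iteration gives every library as many genes, drawn uniformly without
    replacement from gene_universe, as it really has.
    """
    rng = random.Random(seed)
    universe = list(gene_universe)
    mapk = set(mapk_genes)
    counts = []
    for _ in range(n_iter):
        hits = sum(1 for k in genes_per_library
                   if any(g in mapk for g in rng.sample(universe, k)))
        counts.append(hits)
    return counts

def empirical_p(observed, null_counts):
    """Add-one empirical p-value: chance of a null count >= observed."""
    return (1 + sum(c >= observed for c in null_counts)) / (1 + len(null_counts))

# hypothetical inputs: 92 libraries with 17 hit genes each, a 20,000-gene
# universe and 250 MAPK pathway genes; 1,000 iterations to keep the demo fast
universe = [f"gene{i}" for i in range(20_000)]
null = permute_mapk_counts([17] * 92, universe, universe[:250], n_iter=1_000)
print(sum(null) / len(null), empirical_p(44, null))
```

The empirical p-value reflects how often the null counts reach the observed 44 libraries; the add-one correction keeps it from ever being exactly zero.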

To identify cooperating mutations, we tested for associations between CISs using Fisher's exact test. After correcting for multiple testing, we found significant associations between Fli1 and Bach2 and between Mitf and Raf1, suggesting these pairs of mutations may cooperate in HS tumorigenesis, although formal proof of cooperation would require further experiments. Interestingly, MITF is an oncogene in melanoma and promotes cell survival via upregulation of BCL2 and other molecules [51]. This suggests that the combination of a growth-signaling mutation and a cell-survival mutation may be crucial for HS development.

We analyzed the overlap between our gene list, for which we could identify 25 human orthologs, and known human cancer genes. Ten of the 25 CIS human orthologs were in the list of 487 cancer genes annotated in the Sanger Institute's Cancer Gene Census [52] (Table S9). This is a significant overlap (Fisher's exact test, p<0.00001). Although all 25 CIS human orthologs had multiple documented somatic mutations in the COSMIC database [53], the significance of this comparison is difficult to ascertain, as over 96% of the ~24,000 genes contained in COSMIC have documented mutations. If we limit our analysis to the 4,682 genes mutated in tumors classified as hematopoietic and lymphoid tissue in COSMIC, we find an overlap of 13 of our 25 CIS genes (Fisher's exact test, p<0.001) (Table S9). There are no HS tumors documented in the COSMIC database. We also compared our gene list to the recent TCGA-sponsored study of AML, because both HS and AML derive from the myeloid lineage. The AML study analyzed mutations and gene fusions in 200 patient samples and identified 2,022 genes with mutations or gene fusions predicted to alter protein sequence [44]. Eleven of our 25 CIS genes overlapped with these AML genes (Table S9), which would not be expected by chance (Fisher's exact test, p<0.00001). These results support the hypothesis that our mouse model has discovered cancer genes relevant to human cancer, and to myeloid malignancies specifically.

We analyzed networks associated with the 27 CIS human orthologs, including the two microRNAs, using Ingenuity Pathway Analysis (Ingenuity Systems, www.ingenuity.com). Six of the top ten canonical pathways associated with our gene set involved cancer signaling (Table S10), while seven of the top ten functions involved death or proliferation of cancer cells (Table S11). The major proteins contributing to these associations were MYC, RAF1, JAK2, and PTEN. These findings suggest that agents targeting these signaling pathways, such as ruxolitinib (a JAK inhibitor) or sorafenib (a multikinase inhibitor that targets RAF), could be effective in HS patients with a matching genetic profile.

Finally, we used a method of identifying cooperating mutations in our tumors that does not rely on defined CISs. Instead, we used an algorithm called frequent itemset mining [37,38]. The algorithm identifies combinations of insertions that frequently co-occur in multiple tumors. These groups of genes can reach statistical significance even though they do not reach significance as CISs. Analysis of 1,575 transposon insertions (Table S4) using frequent itemset mining identified 38 sets of genes that were mutated in three or more mice, with an FDR<0.25 (Table S12). A total of 28 genes comprise the 38 sets, with several genes appearing in multiple sets. The majority of the gene sets (24/38) contained three or more of the following genes: Pcf11, Dennd2c, Serpinf1, Ncoa2, Dctn4, Kif2c, Basp1 and Raf1. For example, three mice had tumors with transposon insertions in seven of these eight genes (see itemset #11 in Table S12). These results suggest that combinations of alterations in these genes may function coordinately to generate HS, although functional validation will require further experiments.

Discussion 

HS is a rare human neoplasm that is difficult to diagnose and 
has a poor prognosis. To understand the genetics of HS, with the 
goal of expanding treatment options for these patients, we 
conducted a forward genetic screen in mice using the Sleeping 
Beauty DNA transposon as a mutagen. The majority of mice in 
the experimental cohort developed symptoms associated with HS. 

CIS analysis identified 26 mouse protein-coding genes and two
microRNAs that are putative drivers of HS in our model. We
identified human orthologs for 25 of the genes, including both
microRNAs. These candidate HS cancer genes were significantly
enriched for human cancer genes based on the Sanger Institute's
Cancer Gene Census and the COSMIC database. The list was also
enriched for genes mutated in AML based on TCGA data. The
significant overlap between genes identified in our screen and
known human cancer genes suggests these genes are highly
relevant as candidate cancer genes in HS.
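
Overlap enrichment of this kind is often assessed with a one-sided hypergeometric test. This section does not state which test was used, so the following Python sketch is an assumption for illustration only. The function name `hypergeom_upper_tail` is hypothetical, and the population size, reference-set size, candidate-list size, and observed overlap in the example are placeholders, not the screen's data.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes considered (e.g. all annotated genes)
    successes:  genes in the reference set (e.g. known cancer genes)
    draws:      genes in the candidate list
    k:          observed overlap between the candidate list and the reference set
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    # Sum exact probabilities of observing k, k+1, ..., upper overlapping genes.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical example: 25 candidate genes, 5 of which fall in a reference set
# of 500 cancer genes drawn from 20,000 annotated genes.
p_value = hypergeom_upper_tail(5, population=20000, successes=500, draws=25)
expected_overlap = 25 * 500 / 20000  # 0.625 genes expected by chance
print(f"expected overlap {expected_overlap:.3f}, P(X >= 5) = {p_value:.2e}")
```

Using exact integer binomial coefficients avoids the underflow that naive floating-point products would hit for gene-scale populations.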

The top three genes identified in our screen have been linked to
human cancers. Raf1 is part of the MAP kinase pathway and is
important for cell fate decisions. Altered RAF1 is associated with
the development of Noonan and LEOPARD syndromes, AML,
and pilocytic astrocytoma [54-56]. Fli1 is an ETS transcription
factor, and human FLI1 forms a fusion with EWS in 85% of Ewing
sarcoma patients. Interestingly, the other major EWS fusion
partner found in Ewing sarcoma patients is ERG, another gene
identified in our screen [57]. FLI1 fusions have also been found in
prostate cancer [58], and abnormal FLI1 expression in AML
patients correlates with poor prognosis [59]. BACH2, paradoxically,
is a suspected tumor suppressor in CML and Burkitt
lymphoma [60,61]. BACH2 is activated by oxidative stress and
can inhibit proliferation and trigger apoptosis in cell lines [62].
Based on our screen, we predict that Bach2 is overexpressed in HS
tumors, suggesting oncogenic activity via aberrant
activation of this transcriptional repressor in myeloid cells. In
support of this hypothesis, BACH2 is significantly overexpressed in
CLL and B-cell ALL [63]. Intriguingly, several case
reports describe HS developing as a secondary cancer and/or
morphologic variant in patients with B-cell lymphoma, with
evidence that the neoplasms are clonally related [64-66], suggesting
similar genetic etiologies. BACH2 has recently been shown to be
important for B-cell germinal center formation, where B cells
undergo somatic hypermutation and extremely rapid proliferation
[67]. It is possible that Bach2 overexpression in HS results in a
transcriptional change that favors rapid proliferation in these cells.

Identifying effective targeted therapies for rare cancers is
extremely difficult because the small number of patients makes it
impossible to conduct informative clinical trials. Our mouse
model can be used to identify potential therapeutic targets in HS.
Both Raf1 and Myc were identified as candidate genes in our study.
Two case reports of cytogenetic analysis of human HS have
identified extra copies of chromosome 8, where MYC resides,
suggesting MYC is involved in human HS [8,9]. We found that
over 50% of tumors in our screen had transposon insertions near
MAPK pathway genes, suggesting that MAPK pathway inhibitors,
or HDAC inhibitors such as FK228 that significantly decrease RAF1
levels [68], may be effective therapeutics for HS patients.

Table 2. Eight candidate cooperating genes in HS.

Gene       Predicted functional effect of transposon insertion    Reference
Pcf11      Gain: Increased transcription termination              69
Serpinf1   Loss: Relief from angiogenesis inhibition              71
Dennd2c    Unknown: Disruption of Rab9a signaling                 70
Kif2c      Loss: Altered chromosomal segregation                  72
Raf1       Gain: Activation of MAP kinase signaling               73
Dctn4      Unknown: Altered trafficking along microtubules        74
Ncoa2      Gain: Altered nuclear hormone signaling                75
Basp1      Unknown: Altered WT1 transcription                     76

doi:10.1371/journal.pone.0097280.t002

Another possible therapeutic target for HS patients, based on
our findings, is FLI1 signaling. Abnormal expression of FLI1 is
associated with AML and T-cell lymphoma [20,59], while FLI1
fusion proteins are linked to Ewing sarcoma and prostate
cancer [58,69].

The "hallmarks of cancer" paradigm [70] posits that multiple
pathways are disrupted in a single cancer. We used frequent
itemset mining analysis of transposon insertions to identify
multiple genes that were co-mutated in several tumors. This
analysis identified 38 gene sets composed of 28 genes. Analysis of
these 38 gene sets indicates that different subsets of only eight
genes (Pcf11, Dennd2c, Serpinf1, Ncoa2, Dctn4, Kif2c, Basp1 and
Raf1) contribute heavily to a majority of the itemsets. Based on the
function of these eight co-occurring genes [71-78], we hypothesize
that the combination of effects listed in Table 2 can cooperate to
generate HS. The next step will be to directly test these
combinations using in vitro and in vivo models in which the set of
genes is coordinately manipulated and the effect on cancer
phenotypes is measured.

In conclusion, we have identified several candidate genetic
drivers of HS using a transposon-based forward genetic screen in
mice. The genes we identified are frequently associated with
human cancer, including cancers highly related to HS. These
findings lay the groundwork for testing new therapeutics to treat
this rare neoplasm that currently has a very poor prognosis.

Supporting Information 

Figure S1 Three elements for activating SB transposition in myeloid cells.

(TIF) 

Figure S2 Breeding scheme for generating experimental animals and littermate controls.

(TIF) 

Figure S3 Histiocytic neoplasms (HN) in multiple mice
demonstrate involvement at multiple sites, local invasion/destruction,
and aggressive morphology.

(TIF) 



Figure S4 Change in mRNA levels of genes with 
transposon insertions comparing tumor to matched 
normal tissue. 

(TIF) 

Table S1 Number and genotype of cohorts.

(XLSX) 

Table S2 IHC scoring for 3 markers in 8 tumors.

(XLSX) 

Table S3 List of tumors sequenced for transposon 
insertions. 

(XLSX) 

Table S4 Non-redundant genomic regions containing 
transposon insertions (Excel version). 

(XLSX) 

Table S5 Non-redundant genomic regions containing 
transposon insertions (BED formatted text version). 

(BED) 

Table S6 Insertion distribution by donor chromosome. 

(XLSX) 

Table S7 Clonality of multiple tumors from the same 
mouse based on insertion region overlap. 

(XLSX) 

Table S8 Tumors with transposon insertions in/near 
MAPK pathway genes. 

(XLSX) 

Table S9 CIS annotations. 

(XLSX) 

Table S10 IPA Canonical Pathways.

(XLSX) 

Table S11 IPA Annotated Functions.

(XLSX) 

Table S12 Coordinately mutated genes based on frequent itemset mining.

(XLSX) 